Data extraction and transformation for the Florianopolis, State in Brazil¶

This notebook is part of the project "New Schools Location in Brazil" and it is focused on the data extraction and transformation for Florianopolis in Brazil.

Step 0. Import libraries and setup environment¶

Step 1. Define Area of Interest (AoI)¶

For this, we will:

  • Download a administrative limits for Brazil, and get the multi-level admistrative limits of Florianopolis usig the geoBR library.
  • Read schools data downloaded using the R package CensoBr.
code_state abbrev_state name_state code_region name_region geometry
0 11.0 RO Rondônia 1.0 Norte MULTIPOLYGON (((-65.38150 -10.42907, -65.37916...
1 12.0 AC Acre 1.0 Norte MULTIPOLYGON (((-71.07772 -9.82774, -71.08554 ...
2 13.0 AM Amazônas 1.0 Norte MULTIPOLYGON (((-69.83766 -3.68659, -69.71230 ...
3 14.0 RR Roraima 1.0 Norte MULTIPOLYGON (((-63.96008 2.47313, -63.96041 2...
4 15.0 PA Pará 1.0 Norte MULTIPOLYGON (((-51.43248 -0.47334, -51.42843 ...

Let's visualize the Santa Catarina State location in Brazil

No description has been provided for this image

Let's see the number of units and visualize each administrative level

Number of units
Intermediate regions 6
Meso regions 7
Micro regions 20
Municipalities 295
No description has been provided for this image
No description has been provided for this image

Since Florianpolis is a municipality its size is perfect to work directly with UrbanPy.

Para we are doing the analysis at the Micro regions level, it could be interesting to analyze the Micro region that contains Florianopolis.

Step 2. Extract and process Census Sociodemographic geographical data¶

Step 2.1. Extracting Census Tracts Data from IBGE¶

Census Tract's Geometries are obtained in this notebook with the GeoBR Python library

Number of census tracts in Florianopolis: 651
code_tract zone code_muni name_muni name_neighborhood code_neighborhood code_subdistrict name_subdistrict code_district name_district code_state geometry
1894 420540705000180 URBANO 4205407.0 Florianópolis Saco Dos Limões 4.205407e+11 4.205407e+10 NaN 420540705.0 Florianópolis 42.0 POLYGON ((-48.53268 -27.59787, -48.52997 -27.5...
1895 420540705000181 URBANO 4205407.0 Florianópolis Córrego Grande 4.205407e+11 4.205407e+10 NaN 420540705.0 Florianópolis 42.0 POLYGON ((-48.51517 -27.60025, -48.51372 -27.6...
1896 420540705000182 URBANO 4205407.0 Florianópolis Córrego Grande 4.205407e+11 4.205407e+10 NaN 420540705.0 Florianópolis 42.0 POLYGON ((-48.51218 -27.60001, -48.51204 -27.6...
1897 420540705000183 URBANO 4205407.0 Florianópolis Córrego Grande 4.205407e+11 4.205407e+10 NaN 420540705.0 Florianópolis 42.0 POLYGON ((-48.51045 -27.59965, -48.51124 -27.6...
1898 420540705000184 URBANO 4205407.0 Florianópolis Córrego Grande 4.205407e+11 4.205407e+10 NaN 420540705.0 Florianópolis 42.0 POLYGON ((-48.51005 -27.59963, -48.50912 -27.5...

Let's visualize the census tracts in Florianopolis to get a sense of their size and granularity

No description has been provided for this image

We can clearly visualize that the census tracts differ in size depeding on the geographical location. For example, near city centers, the census tracts are smaller, while in the peri-urban areas, the census tracts are bigger.

There are some census tracts that seems to be on the ocean, we should handle them.

Now, we will read a dataset generated with the CensoBR R package here. In this dataset we have census variables such as Income and Population by Age that are useful to identify areas where schools are needed.

Number of census tracts obtainer in R for Santa Catarina State: 308993
code_tract V002 V003 pessoa01_V021 pessoa01_V022 pessoa01_V023 pessoa01_V024 pessoa01_V025 pessoa01_V026 pessoa01_V027 ... pessoa01_V044 pessoa01_V045 pessoa01_V046 pessoa01_V047 pessoa01_V048 pessoa01_V049 pessoa01_V050 pessoa01_V051 pessoa01_V052 income_pc
1 354660305000050 7 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 354660305000051 2 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 354670205000001 569 495793.0 14.0 13.0 4.0 7.0 9.0 8.0 8.0 ... 11.0 5.0 6.0 8.0 11.0 10.0 4.0 10.0 3.0 871.340949
4 354670205000002 422 330949.0 9.0 6.0 10.0 7.0 7.0 8.0 10.0 ... 4.0 3.0 4.0 10.0 4.0 8.0 4.0 2.0 3.0 784.239336
5 354670205000003 722 570807.0 15.0 10.0 10.0 12.0 18.0 14.0 17.0 ... 8.0 10.0 14.0 11.0 4.0 11.0 14.0 9.0 6.0 790.591413

5 rows × 36 columns

Type of code_tract in para_cntr: int64
Type of code_tract in censobr_tract_income: object
Number of census tracts with associated data: 11858
No description has been provided for this image

Now we will aggregate population by age variables according to the brazilian school levels:

  • Infantil: 3 months to 5 years old
  • Elementary: 6 to 14 years old
  • Middle: 15 to 17 years old
  • High: 18 years old
No description has been provided for this image

Step 2.2. Download Meta's High Resolution Population Density¶

Let's see all the available datasets for Brazil in Humanitarian Data Exchange (HDX).

created name population size_mb url
id
0 2019-06-11 BRA_children_under_five_2019-06-01_csv.zip Children (ages 0-5) 159.14 https://data.humdata.org/dataset/c17003d1-47f4...
1 2019-06-11 BRA_elderly_60_plus_2019-06-01_csv.zip Elderly (ages 60+) 159.42 https://data.humdata.org/dataset/c17003d1-47f4...
2 2019-06-11 BRA_men_2019-06-01_csv.zip Men 160.20 https://data.humdata.org/dataset/c17003d1-47f4...
26 2019-06-11 BRA_women_2019-06-01_csv.zip Women 160.21 https://data.humdata.org/dataset/c17003d1-47f4...
27 2019-06-11 BRA_women_of_reproductive_age_15_49_2019-06-01... Women of reproductive age (ages 15-49) 160.03 https://data.humdata.org/dataset/c17003d1-47f4...
28 2019-06-11 BRA_youth_15_24_2019-06-01_csv.zip Youth (ages 15-24) 159.79 https://data.humdata.org/dataset/c17003d1-47f4...
33 2019-06-25 population_bra_northeast_2018-10-01.csv..zip Overall population density 71.24 https://data.humdata.org/dataset/c17003d1-47f4...
34 2019-06-25 population_bra_northwest_2018-10-01.csv.zip Overall population density 13.46 https://data.humdata.org/dataset/c17003d1-47f4...
35 2019-06-25 population_bra_southeast_2018-10-01.csv.zip Overall population density 159.42 https://data.humdata.org/dataset/c17003d1-47f4...
36 2019-06-25 population_bra_southwest_2018-10-01.csv.zip Overall population density 45.96 https://data.humdata.org/dataset/c17003d1-47f4...

We will download the Overall population density dataset from HDX. This dataset contains the population density of Brazil at 30m2 resolution.

CPU times: user 287 ms, sys: 57 ms, total: 344 ms
Wall time: 2.58 s

The obtained dataset is simple and contains the following columns:

latitude longitude population_2015 population_2020 geometry
887751 -27.853472 -48.606528 0.518908 0.582712 POINT (-48.60653 -27.85347)
887752 -27.853472 -48.606250 0.518908 0.582712 POINT (-48.60625 -27.85347)
887753 -27.853472 -48.605972 0.518908 0.582712 POINT (-48.60597 -27.85347)
887754 -27.853472 -48.605694 0.518908 0.582712 POINT (-48.60569 -27.85347)
887755 -27.853472 -48.605139 0.518908 0.582712 POINT (-48.60514 -27.85347)
Number population data points for Florianopolis: 65086

Step 2.3. Aggregate Population Density in Hexagons (Uniform Spatial Units)¶

CPU times: user 101 ms, sys: 3.75 ms, total: 105 ms
Wall time: 300 ms
0
((989, 2), (989, 3))

Step 2.4. Convert Census Tract Level Data to Hexagons Level¶

Census tract level numerical variables like population and income will be divided in hexagons proportionally to the Meta high-resolution population density.

('EPSG:4326', 'EPSG:4674')
hex geometry population_2020
0 88a91bcd09fffff POLYGON ((-48.50489 -27.84191, -48.50072 -27.8... NaN
1 88a91bcd51fffff POLYGON ((-48.52335 -27.84365, -48.51917 -27.8... NaN
2 88a91b425bfffff POLYGON ((-48.46311 -27.69699, -48.45893 -27.6... 19.249137
3 88a91b5565fffff POLYGON ((-48.46642 -27.70402, -48.46224 -27.7... 9.624569
4 88a91b4d57fffff POLYGON ((-48.35799 -27.49173, -48.35381 -27.4... NaN
  0%|          | 0/651 [00:00<?, ?it/s]
hex                      0
geometry                 0
population_2020        474
pop_3_5_years_adj      474
pop_6_14_years_adj     474
pop_15_17_years_adj    474
pop_18_years_adj       474
V002_adj               474
V003_adj               474
dtype: int64

Compare the total number of census tracts and hexagons in Florianopolis

Census Tracts: 651 --> Hexagons: 989

Step 2.5. Visualize downsampled population data in hexagons¶

Let's see how the population and income data looks like in the hexagons. Visualizing data side by side is a good way to see if the data is consistent.

No description has been provided for this image

Another way to check population consistency is to compare population distribution with road network density. The idea is that areas with higher population density should have more roads.

No description has been provided for this image

We can clearly observe that the population values mostly are located around the road network. This is a good sign that our downscaling method is working as expected. Also, higher population values correspond to higher density of roads.

Step 2.6. Calculate and visualize income per capita¶

No description has been provided for this image

Step 3. Extract and process Schools geographical data¶

Number of schools in Brazil: 222936
abbrev_state name_muni code_school name_school education_level education_level_others admin_category address phone_number government_level private_school_type private_government_partnership regulated_education_council service_restriction size urban location_type date_update geometry
0 RO Porto Velho 11000023 EEEE ABNAEL MACHADO DE LIMA - CENE Ensino Fundamental Atendimento Educacional Especializado Pública AVENIDA AMAZONAS, 6492 ZONA LESTE. TIRADENTES.... (69) 992083054 Estadual Não Informado Não Não ESCOLA ATENDE EXCLUSIVAMENTE ALUNOS COM DEFICI... Entre 51 e 200 matrículas de escolarização Urbana A escola não está em área de localização difer... 2020-10-18 POINT (-63.85401 -8.75846)
1 RO Porto Velho 11000040 EMEIEF PEQUENOS TALENTOS Educação Infantil Pública RUA CAETANO, 3256 PREDIO. CALADINHO. 76808-108... (69) 32135237 Municipal Não Informado Não Sim ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... Entre 201 e 500 matrículas de escolarização Urbana A escola não está em área de localização difer... 2020-10-18 POINT (-63.88392 -8.79373)
2 RO Porto Velho 11000058 CENTRO DE ENSINO CLASSE A Educação Infantil, Ensino Fundamental, Ensino ... Privada AVENIDA CARLOS GOMES, 1135 CENTRO. 76801-123 P... (69) 32244473 Privada Particular Não Sim ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... Mais de 1000 matrículas de escolarização Urbana A escola não está em área de localização difer... 2020-10-18 POINT (-63.90199 -8.76073)
3 RO Porto Velho 11000082 CENTRO EDUCACIONAL PRESBITERIANO 15 DE NOVEMBRO Educação Infantil, Ensino Fundamental Privada RUA ALMIRANTE BARROSO, 1483 SANTA BARBARA. 768... (69) 32245636 Privada Particular Não Sim ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... Entre 51 e 200 matrículas de escolarização Urbana A escola não está em área de localização difer... 2020-10-18 POINT (-63.89618 -8.76520)
4 RO Porto Velho 11000104 CENTRO EDUC CORA CORALINA Educação Infantil, Ensino Fundamental Privada RUA MEXICO, 1056 NOVA PORTO VELHO. 76820-190 P... (69) 32252616 Privada Particular Não Sim ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... Entre 501 e 1000 matrículas de escolarização Urbana A escola não está em área de localização difer... 2020-10-18 POINT (-63.87547 -8.76863)

Step 3.1. Aggregate categorical variables in hexagons¶

To work with categorical data we have to convert from text to numerical values. To do this we will create one column for each class in the caterogical variable and fill with 1 if the school belongs to that class and 0 otherwise. This process is known as One or Multi-hot encoding.

Example of education level columns assigned to the schools:

ensino_fundamental educacao_infantil ensino_medio educacao_jovens_adultos educacao_profissional no_specified
0 1 0 0 0 0 0
1 0 1 0 0 0 0
2 1 1 1 0 0 0
3 1 1 0 0 0 0
4 1 1 0 0 0 0

Example of administrative category columns assigned to the schools:

publica privada
0 1 0
1 1 0
2 0 1
3 0 1
4 0 1
(7213, 27)
((351, 27), (351,))

Once aggregated the categorical variables, we can visualize the number of schools per hexagon. We will overlay the schools points to see if the aggregation is working as expected.

No description has been provided for this image

Schools distribution seems to be consistent with the population distribution. This is a good sign that our aggregation method is working as expected.

Step 3.2. Aggregate numerical variables in hexagons¶

In this official dataset provided by IBGE, we don't have the enrollment data. We will use the enrollment data from the a previous schools dataset to estimate the number of students per school and hexagon.

Restrição de Atendimento Escola UF Município Localização Categoria Administrativa Dependência Administrativa Categoria Escola Privada Conveniada Poder Público Regulamentação pelo Conselho de Educação Porte da Escola Etapas e Modalidade de Ensino Oferecidas Outras Ofertas Educacionais LAT_FINAL LONG_FINAL Matriculas totales Matrículas oficiales
0 ESCOLA ATENDE EXCLUSIVAMENTE ALUNOS COM DEFICI... EEEE ABNAEL MACHADO DE LIMA - CENE RO Porto Velho Urbana Pública Estadual Não Informado Não Não Entre 51 e 200 matrículas de escolarização Ensino Fundamental Atendimento Educacional Especializado -8.758459 -63.854011 159.0 94.0
1 ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... CENTRO DE ENSINO CLASSE A RO Porto Velho Urbana Privada Privada Particular Não Sim Mais de 1000 matrículas de escolarização Educação Infantil, Ensino Fundamental, Ensino ... NaN -8.760734 -63.901986 1433.0 1433.0
2 ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... CENTRO EDUCACIONAL PRESBITERIANO 15 DE NOVEMBRO RO Porto Velho Urbana Privada Privada Particular Não Sim Entre 51 e 200 matrículas de escolarização Educação Infantil, Ensino Fundamental NaN -8.765205 -63.896177 59.0 59.0
3 ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... CENTRO EDUC CORA CORALINA RO Porto Velho Urbana Privada Privada Particular Não Sim Entre 501 e 1000 matrículas de escolarização Educação Infantil, Ensino Fundamental NaN -8.768633 -63.875471 636.0 636.0
4 ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... CENTRO EDUCACIONAL MOJUCA RO Porto Velho Urbana Privada Privada Particular Não Sim Entre 201 e 500 matrículas de escolarização Ensino Fundamental, Ensino Médio NaN -8.765028 -63.891857 290.0 290.0

Separate matriculas by education level to calculate the number of students per hexagon separting by education level and target student population.

ensino_fundamental educacao_infantil ensino_medio educacao_jovens_adultos educacao_profissional no_specified
0 1 0 0 0 0 0
1 1 1 1 0 0 0
2 1 1 0 0 0 0
3 1 1 0 0 0 0
4 1 0 1 0 0 0
No description has been provided for this image

This maps shows the difference between the number of students and the target student population for each education level.

  • Blue hexagons have negative values and represent areas where there are more students than the target population.
  • Red hexagons have positive values and represent areas where there are less students than the target population.

Given this we could say that:

  • Blue hexagons are city centers were there is less population but have a lot of students that come from other areas.
  • Red hexagons are most present in rural areas were there is more target student population that have to travel to other areas to attend school.
  • This second group could be in danger of not attending school due to the distance.

Step 4. Assing the hexagon zone (rural or urban)¶

We will use the census track variable to determine the urban area of Pará.

No description has been provided for this image

We can see that the urban areas are correctly identified. Considering the satellite image, urban areas are the ones with more buildings and roads.

Summary¶

We have extracted and processed schools and sociodemographic data from multiple source. The final results is a dataset on which each row is a H3 Hexagon. This will allow us to compare and analyze different microregions in Para. The variables obtained are the the following:

Source Variable name Type Description
Meta's HRSM population_2020 Numerical Total Population
IBGE's Census* pop_3_5_years_adj Numerical Population between 3 months and 5 years old
IBGE's Census* pop_6_14_years_adj Numerical Population between 6 and 14 years old
IBGE's Census* pop_15_17_years_adj Numerical Population between 15 and 17 years old
IBGE's Census* pop_18_years_adj Numerical Population between 18 years old
IBGE's Census* V002_adj Numerical Total Population
IBGE's Census* V003_adj Numerical Income (R$)
INEP's School Census ensino_fundamental Numerical Ensino Fundamental Schools Count
INEP's School Census educacao_infantil Numerical Educacao Infantil Schools Count
INEP's School Census ensino_medio Numerical Ensino Medio Schools Count
INEP's School Census educacao_jovens_adultos Numerical Educacao Jovens Adultos Schools Count
INEP's School Census educacao_profissional Numerical Educacao Profissional Schools Count
INEP's School Census no_specified Numerical Count of schools with unspecified type
INEP's School Census publica Numerical Public Schools Count
INEP's School Census privada Numerical Private Schools Count
Custom School Census** Matriculas totales Numerical Total Enrolled Students
Custom School Census** Matrículas oficiales Numerical Oficially Enrolled Students
Custom School Census** matric_total_[school type] Numerical Enrolled Students by School Type
Custom School Census** matric_oficial_[school type] Numerical Enrolled Students by School Type
Custom School Census** diff_pop_matric_total_[school type] Numerical Difference between target student population and enrolled students by School Type
Custom School Census** rate_pop_matric_total_[school type] Numerical Ratio between target student population and enrolled students by School Type
IBGE's Urban Footprints urban_area Categorical 1=Urban or 0=Rural
Ours income_pc Numerical Income Per Capita (R$)

* IBGE's Census: Variables where downscaled proportionally to Meta's high resolution population data.
** Custom School Census: This dataset was provided by IDB Education team in Brazil (need to ask for the source)